Golang Job: Site Reliability Engineer **REMOTE**

Company

Artech Information Systems
United States of America

Location

Remote Position
(From Everywhere/No Office Location)

Job type

Full-Time

Golang Job Details

Job Title: Site Reliability Engineer - Data Platforms
Location: REMOTE (candidates may be located in either the PT or MT time zone)
Duration: 6 months
Pay rate range: $60 - $73/hr.

Must have skills: AWS, Kubernetes, EKS, CI/CD, Linux, Python, Big Data

Understanding of Scala and familiarity with Java is a plus.

Job Description:

Data Platform Services SREs work directly with our partner engineering teams in an embedded SRE model, operating in unison with the developers to deliver seamless experiences for our customers. We run a mix of open source, vendor-licensed, and internally developed tools, which you will use and have opportunities to improve. The cross-functional team collaborates to ensure we apply a consistent incident management process across all data platform services and provide user-journey-based SLOs derived from exhaustive observability metrics, high availability architecture, and automation for deployments. We think critically and strive to balance the best solution with the need to get things done for each engineering challenge we face. Good ideas are heard and results are rewarded.

We are looking for passionate and talented Site Reliability Engineers to continue our focus on providing our customers the highest-quality services experience. Our services have to scale globally, stay highly available, and "just work." If you love designing, engineering, and running systems and infrastructure that will help millions of customers, then this is the place for you!

Key Qualifications

  • Strong sense of ownership and integrity demonstrated through clear communication and collaboration.
  • Proficiency with the architecture, deployment, performance tuning, and troubleshooting of open source data analytics technologies, especially Apache Spark, Flink, Airflow, and related software in a large-scale environment.
  • Experience deploying and managing highly available applications on internal and public cloud infrastructure, principally Kubernetes.
  • The ability to design, author, and release code in languages like Go, Python, or Java.
  • Acute drive to automate manual operations and to improve them through repeated iteration.
  • Understanding of the Linux operating system, standard networking protocols, storage, and databases.
  • Hands-on experience managing large numbers of diverse systems with configuration management or software delivery platforms (such as Puppet, Chef, Ansible, and Spinnaker).
  • Experience with deploying, supporting and monitoring new and existing services, platforms, and application stacks.
  • Excellent troubleshooting and problem-solving skills.
  • Experience with scale testing, disaster recovery, and capacity planning.
  • Ability to participate in our 24x7 weekly on-call rotation.

Skills

  • Big Data Processing, Apache Spark, Flink, or related technologies
  • Kubernetes or Amazon EKS
  • Python, Golang, and/or Java comprehension and development experience
  • Infrastructure-as-code orchestration tools, such as Terraform or Pulumi
  • High-Availability Architecture
  • Automating manual processes and CI/CD concepts
  • Troubleshooting in mission-critical production environments
  • Agile


Education or Experience

BS/MS in Computer Science or equivalent (5+ years of software development or production operations experience in a large-scale environment)